[WIP] Refactor exec_place to unified grid model by andralex · Pull Request #8035 · NVIDIA/cccl

andralex · 2026-03-14T02:03:47Z

Summary

This PR refactors exec_place to use a unified grid model where all execution places are treated as grids:

Scalar places (host, single device, cuda_stream, green_ctx) are 1-element grids
Multi-device grids work as before

Key Changes

Added get_dims(), get_place(idx) to exec_place::impl interface
Changed activate()/deactivate() to take an index parameter (defaults to 0 for backward compatibility)
Moved set_current_place()/unset_current_place()/get_current_place() from exec_place_grid to base exec_place
Deprecated is_grid() in favor of size() > 1
Updated all client code (stream_task, graph_task, parallel_for_scope, launch, place_partition) to use the new interface

Benefits

Eliminates special-casing for grids vs non-grids
Allows uniform iteration over any exec_place
Cleaner API with consistent indexed access

Status

Work in Progress - needs build verification and testing.

Test plan

Build verification
Run existing CUDASTF tests
Verify no behavioral changes

Made with Cursor

copy-pr-bot · 2026-03-14T02:03:51Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

All execution places are now modeled as grids: - Scalar places (host, device) are 1-element grids - Multi-device grids remain as before Key changes: - Added get_dims(), get_place(idx) to exec_place::impl - Changed activate/deactivate to take index parameter - Moved set_current_place/unset_current_place to exec_place base - Deprecated is_grid() in favor of size() > 1 - Updated all client code to use new interface This eliminates special-casing for grids vs non-grids and allows uniform iteration over any exec_place. Made-with: Cursor

The shell class added no value - just use the dynamic interface directly and return exec_place from the factory methods. Made-with: Cursor

andralex · 2026-03-14T02:28:05Z

/ok to test 6e2df83

Same as exec_place_cuda_stream - the shell class adds no value. Use the dynamic interface directly and return exec_place from factory. Made-with: Cursor

- Remove meaningless const from return-by-value affine_data_place() - Remove virtual is_grid() from impl - just use size() > 1 directly Made-with: Cursor

Replace separate operator== and operator< virtual methods with a single cmp() method that returns -1/0/1, consistent with data_place_interface. Made-with: Cursor

With the unified grid model, 1-element grids have size()==1 but is_device()==false. Update place_partition to handle this case by extracting the underlying scalar place from 1-element grids. Made-with: Cursor

- Remove virtual is_host()/is_device() from impl; move logic to shell (base impl already returns correct values via affine data place) - Move grid iteration state (current_idx, saved_prev_impl) to grid impl since only multi-element grids use iteration - Add virtual accessors for grid state with assertions for misuse Made-with: Cursor

andralex · 2026-03-14T04:39:53Z

/ok to test 35c8060

The condition `is_grid()` was changed to `size() > 1` but this broke 1-element grids (e.g. `all_devices()` on single-GPU systems). Fix 1: Restore is_grid() to detect any grid by checking if affine_data_place() is invalid (only grids have invalid affine). Fix 2: Factory functions now collapse 1-element grids to scalars: - make_grid(), all_devices(), n_devices(), repeat() - partition_cyclic(), partition_tile() This ensures that by construction, any true grid has size() > 1, making `size() > 1` equivalent to `is_grid()` in practice. Benefits: - Simpler mental model: grids always have multiple elements - No edge cases for 1-element grids - Single-GPU `all_devices()` returns `device(0)` directly Return type changes (exec_place_grid -> exec_place): - all_devices(), n_devices(), repeat(), make_grid() - partition_cyclic(), partition_tile() - place_partition::as_grid() Made-with: Cursor

With 1-element grids now collapsed to scalars by factory functions, is_grid() is equivalent to size() > 1. Remove the method and use the simpler size check directly. Made-with: Cursor

andralex · 2026-03-16T20:08:23Z

/ok to test 2779e89

Update data_place::composite() to accept const exec_place& instead of const exec_place_grid&. This allows passing exec_place from factory functions that now return exec_place (like repeat(), all_devices()). Also update: - data_place_composite to store exec_place instead of exec_place_grid - get_grid() to return const exec_place& - localized_array constructor parameter - slice interface local variables Made-with: Cursor

andralex · 2026-03-16T22:34:40Z

/ok to test 383957a

exec_place_grid was a shell class that provided no additional value over exec_place now that all places support the unified grid model. This change: - Removes exec_place_grid class, keeps impl as exec_place_grid_impl - Updates make_grid() to return exec_place directly - Changes as_grid() to return const exec_place& (just returns *this) - Updates place_partition constructor to take const exec_place& - Updates loop_dispatch template parameter to exec_place - Removes deleted parallel_for(exec_place_grid, ...) overload - Updates C API (stf.cu) to use exec_place* instead of exec_place_grid* - Removes forward declarations of exec_place_grid The C API types (stf_exec_place_grid_handle) are unchanged as they are opaque void* handles that don't depend on the C++ class name. Made-with: Cursor

andralex · 2026-03-16T22:54:47Z

/ok to test 6534679

andralex · 2026-03-17T01:29:42Z

/ok to test 77589ae

github-actions · 2026-03-17T02:16:07Z

🥳 CI Workflow Results

🟩 Finished in 44m 50s: Pass: 100%/52 | Total: 19h 30m | Max: 37m 02s | Hits: 48%/24201

See results here.

github-project-automation bot added this to CCCL Mar 14, 2026

github-project-automation bot moved this to Todo in CCCL Mar 14, 2026

cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Mar 14, 2026

andralex force-pushed the refactor/exec-place-unified-grid branch from 74bf808 to dacd49e Compare March 14, 2026 02:06

Remove exec_place_cuda_stream shell class

6e2df83

The shell class added no value - just use the dynamic interface directly and return exec_place from the factory methods. Made-with: Cursor

Remove exec_place_green_ctx shell class

eb16cfe

Same as exec_place_cuda_stream - the shell class adds no value. Use the dynamic interface directly and return exec_place from factory. Made-with: Cursor

This comment has been minimized.

Sign in to view

andralex added 5 commits March 13, 2026 23:37

Clean up affine_data_place() return type and remove virtual is_grid()

a99fd06

- Remove meaningless const from return-by-value affine_data_place() - Remove virtual is_grid() from impl - just use size() > 1 directly Made-with: Cursor

Simplify exec_place comparison using three-way cmp()

f7d6c05

Replace separate operator== and operator< virtual methods with a single cmp() method that returns -1/0/1, consistent with data_place_interface. Made-with: Cursor

Fix place_partition for 1-element grids

86aab4d

With the unified grid model, 1-element grids have size()==1 but is_device()==false. Update place_partition to handle this case by extracting the underlying scalar place from 1-element grids. Made-with: Cursor

Merge branch 'main' into refactor/exec-place-unified-grid

35c8060

This comment has been minimized.

Sign in to view

andralex force-pushed the refactor/exec-place-unified-grid branch from 187f3d2 to 01fe359 Compare March 16, 2026 18:06

andralex force-pushed the refactor/exec-place-unified-grid branch from 01fe359 to 672cf33 Compare March 16, 2026 18:53

Remove is_grid() method - use size() > 1 instead

36ea11e

With 1-element grids now collapsed to scalars by factory functions, is_grid() is equivalent to size() > 1. Remove the method and use the simpler size check directly. Made-with: Cursor

andralex force-pushed the refactor/exec-place-unified-grid branch from 686889a to 36ea11e Compare March 16, 2026 19:55

Merge branch 'main' into refactor/exec-place-unified-grid

2779e89

This comment has been minimized.

Sign in to view

andralex added 2 commits March 16, 2026 21:28

Merge branch 'main' into refactor/exec-place-unified-grid

c6487fc

pre-commit

77589ae

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Refactor exec_place to unified grid model#8035

[WIP] Refactor exec_place to unified grid model#8035
andralex wants to merge 15 commits intomainfrom
refactor/exec-place-unified-grid

andralex commented Mar 14, 2026

Uh oh!

copy-pr-bot bot commented Mar 14, 2026

Uh oh!

andralex commented Mar 14, 2026

Uh oh!

This comment has been minimized.

andralex commented Mar 14, 2026

Uh oh!

This comment has been minimized.

andralex commented Mar 16, 2026

Uh oh!

This comment has been minimized.

andralex commented Mar 16, 2026

Uh oh!

andralex commented Mar 16, 2026

Uh oh!

This comment has been minimized.

andralex commented Mar 17, 2026

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andralex commented Mar 14, 2026

Summary

Key Changes

Benefits

Status

Test plan

Uh oh!

copy-pr-bot bot commented Mar 14, 2026

Uh oh!

andralex commented Mar 14, 2026

Uh oh!

This comment has been minimized.

andralex commented Mar 14, 2026

Uh oh!

This comment has been minimized.

andralex commented Mar 16, 2026

Uh oh!

This comment has been minimized.

andralex commented Mar 16, 2026

Uh oh!

andralex commented Mar 16, 2026

Uh oh!

This comment has been minimized.

andralex commented Mar 17, 2026

Uh oh!

github-actions bot commented Mar 17, 2026

🥳 CI Workflow Results

🟩 Finished in 44m 50s: Pass: 100%/52 | Total: 19h 30m | Max: 37m 02s | Hits: 48%/24201

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant